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The  problem  of  estimating  the  variance  parameter  robustly  in  a 
heteroscedatic  linear  model  is  considered.  The  situation  where  the 
variance  is  a  function  of  the  explanatory  variables  is  treated.  To 
estimate  the  variance  robustly  in  this  case,  it  is  necessary  to  guard 
against  the  influence  of  outliers  in  the  design  as  well  as  outliers  in 
the  response.  By  analogy  with  the  homoscedastic  regression  case, 
two  estimators  are  proposed  which  do  this.  Their  performance  is 
evaluated  on  a  number  of  data  sets.  We  had  considerable  success  with 
estimators  that  bound  the  "self-influence",  that  is,  the  influence 
an  observation  has  on  its  own  fitted  value.  We  conjecture  that  in 
other  situations,  for  example,  homoscedastic  regression,  bounding  the 
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1  -  Introduction 

Heteroscedastic  linear  models  occur  frequently  in  practice,  in  such 
fields  as  radioimmunoassay  (Finney  (1976)  and  Rodbard  and  Frazier  (1975)), 
chemical  kinetics  (Box  and  Hill  (1974)),  and  econometrics  (Hildreth  and 
Houck  (1968)).  A  number  of  recent  papers  are  concerned  with  modeling  the 
variance  as  a  parametric  function  either  of  the  mean  response  or  the  in¬ 
dependent  variables;  see,  for  example.  Box  and  Hill  (1974),  Carroll  and 
Ruppert  (1982) ,  and  Dent  and  Hildreth  (1977) . 

Modeling  the  heteroscedasticity  not  only  allows  the  statistician  to  un¬ 
derstand  better  the  nature  of  the  statistical  variability  in  the  data,  but 
also  to  estimate  the  mean  response  more  efficiently  by  using  estimated  var¬ 
iances  as  weights. 

In  this  paper  we  introduce  outlier-resistant  (or  "bounded-influence") 
estimation  methods  for  such  heteroscedastic  linear  models.  Outlier-resis¬ 
tant  methods  are,  of  course,  useful  when  the  error  distribution  is  non-nor¬ 
mal,  but  they  are  of  benefit  as  well  for  other  types  of  deviations  from  the 
ideal  model.  For  example,  the  model  for  the  mean  response  or  for  the  variance 
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may  be  adequate  only  over  a  restricted  region  of  the  design  space.  At  the 
extreme  points  of  the  design  these  parametric  models  may  break  down;  this 
is  a  particularly  dangerous  situation  since  these  so-called  high- leverage 
points  are  extremely  influential  in  determining  the  values  of  the  estimators 
unless  "bounded- influence"  estimators  are  used.  Also,  an  occasional  value 
of  an  independent  variable  may  be  grossly  in  error,  for  example,  due  to  a 
recording  error,  though  otherwise  the  x-variables  are  assumed  to  be  without 
error  -  we  are  not  thinking  of  the  errors-in-variables  problem  where  the  in¬ 
dependent  variables  are  measured  with  errors  •  Here,  again,  the  use  of  tra¬ 
ditional,  unbounded- influence  estimators  is  risky  since  a  disastrously 
wrong  x-value  is  likely  to  have  high  leverage. 

The  preceeding  motivation  for  using  bounded- inf luence  methods  is  as 
compelling  for  homoscedastic  models  as  for  heteroscedastic  models,  and 
Mallows  (1975),  Hampel  (1978),  and  Krasker  and  Welsch  (1982)  have  already 
begun  the  development  of  outlier-resistant  methodologies  for  the  homosced¬ 
astic  case. 

Here  we  extend  the  work  of  Mallows  (1975)  and  Krasker  and  Welsch  (1982) 
to  heteroscedastic  models.  We  begin  by  considering  an  example,  reported 
by  Leurgans  in  the  Biostatistics  Casebook  (1980) .  The  data  are  concerned 
with  the  comparison  of  a  test  method  and  a  reference  method  for  measuring 
glucose  concentration  in  blood,  and  consist  of  46  pairs  of  measurements 
( xi  » y"i )  »  x  being  the  reference  and  y  the  test  method.  A  scatter  plot  of  y 
against  x  for  these  data  reveals  a  pronounced  linear  trend.  However,  a  plot 
of  the  least  squares  residuals  against  the  independent  variable,  x,  gives 
a  clear  indication  of  heteroscedasticity ,  the  variances  tending  to  increase 
with  the  value  of  x. 

On  encountering  such  heterogeneity  of  variance  in  the  data,  there  are 
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two  traditional  approaches;  to  transform  the  data,  or  to  perform  a  weighted 
least  squares  type  of  analysis.  Leurgans  chose  to  perform  a  log  transforma¬ 
tion,  fitting  the  model 

Jin  y^  =  a'  +  B'  Jin  x^  +  e^  (1.1) 

While  Leurgans'  choice  of  transformation  is  certainly  reasonable  for  these 
data,  in  many  applied  situations  one  might  be  reluctant  to  transform,  since 
transformation  can  make  it  difficult  to  make  inference  in  the  original  scale. 
In  such  cases,  a  weighted  analysis  may  be  preferable.  We  shall  consider 
weighted  analyses,  assuming  an  underlying  model  of  the  form 

yi  =  “  +  e  xi  +  °iei’  U-2) 

for  the  glucose  data,  where  the  { e ^ }  are  independent  and  identically  distri¬ 
buted  disturbances  and  the  (ck)  reflect  the  heterogeneity  of  variance  in  the 
model.  The  optimal  normal  theory  weighted  analysis  of  the  data  would  involve 
basing  inference  on  the  transformed  variables  y?  =  y^/o^,  =  x^/o^.  Since 

the  cr^'s  are  typically  unknown,  in  practice  one  will  work  instead  with  y? 

=  y^/cK,  x|  =  x./o.,  where  ck  is  an  estimate  of  cn  .  Clearly,  for  efficient 
inference,  one  would  like  to  use  a  good  estimate  of  the  o^'s.  In  the  ab¬ 
sence  of  replication,  the  usual  approach  is  to  assume  some  parametric  model 
for  the  a  's;  see  Carroll  (1982)  for  an  alternative.  For  the  Leurgans  data 
we  modelled  the  variance  in  the  following  way: 

=  alxi|A»  (1-3) 

somewhat  analogous  to  a  model  used  by  Box  and  Hill  (1974),  although  in 
their  model,  the  variance  is  a  function  of  the  mean. 

A  number  of  estimation  procedures  have  been  proposed  for  models  similar 
to  that  described  by  (1.3).  Typically,  a  preliminary  estimate  of  8  is  ob¬ 
tained,  and  the  residuals  from  this  preliminary  fit  are  used  to  gain  infor- 
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mation  about  the  variances.  This  information  is  then  used  to  obtain  a 
more  efficient  estimate  of  g.  For  example.  Box  and  Hill  (1974)  and  Jobson 
and  Fuller  (1980)  both  suggest  a  form  of  generalized  least  squares,  hereafter 
denoted  GLS.  If  one  assumes  a  normal  error  distribution,  one  can  proceed 
in  the  following  three  stages: 

(i)  Obtain  a  preliminary  unweighted  least  squares  estimate  of  g,  sav  g  , 

P 

(ii)  Obtain  a  maximum  likelihood  estimate  of  the  variance  parameters 

(o,X)  pretending  that  g  =  g  ,  and  treating  the  residuals  as  if  they 
were  the  true  errors,  and  form  estimated  standard  deviations  a.  = 
a|x.|^  . 

(iii)  Repeat  stage  (i)  with  the  transformed  variables  yt  =  y^/cr.  xi  = 

x. /a. . 
l  l 

When  applied  to  the  data  set  in  our  example,  this  generalized  least 
squares  estimation  procedure  yielded  a  value  of  .52  for  the  estimate  of  A. 

In  her  analysis  of  the  log- transformed  data,  Leurgans  identified  observa¬ 
tion  #31  (the  point  with  the  lowest  x-  and  y-value)  as  a  massive  outlier, 
and  deleted  it  from  her  analysis.  On  deletion  of  this  point  from  the 
data  set,  the  GLS  estimation  procedure  above  gave  a  value  of  .86  for  the 
estimate  of  A.  A  change  of  this  size  in  the  parameter  estimate,  on  dele¬ 
tion  of  a  single  point,  is  disturbing  and  indicates  an  undesirable  sensi¬ 
tivity  of  the  maximum  likelihood  method  to  a  small  fraction  of  the  data. 

In  the  homoscedastic  regression  case,  two  general  approaches  to  this 
kind  of  problem  have  been  considered: 

(i)  the  influence  approach,  where  the  focus  is  on  identifying  those 
points  in  the  data  set  which  have  a  large  influence  on  inferences 


page  5 


drawn  from  the  data;  see  Belsley,  Kuh  and  Welsch  (1980)  and  Cook 
and  Weisberg  (1985) . 

(ii)  developing  new  estimation  techniques  which  automatically  limit  the 
influence  of  any  small  subset  of  the  data  on  parameter  estimates. 

In  our  experience,  the  two  approaches  complement  rather  than  compete 
with  one  another.  Surprisingly  little  has  been  done  to  make  the  two  ap¬ 
proaches  applicable  for  heteroscedastic  regression.  We  shall  pursue  the 
second  approach  by  developing  bounded  influence  estimators  for  the  vari¬ 
ance  parameter  0  in  heteroscedastic  models.  We  emphasize  that  we  are  not 
interested  in  developing  a  "black-box"  estimation  procedure  -  we  would  like 
our  estimators  to  provide  diagnostic  information  about  influential  points, 
and  to  add  to  our  understanding  of  the  structure  of  the  data 

In  Section  2  we  show,  by  considering  the  influence  function,  why  the 
MLE  for  the  variance  parameter  is  so  sensitive  to  outliers,  and  we  describe 
an  estimator  which  partially  alleviates  this  sensitivity.  We  see  that,  an¬ 
alogously  to  the  homoscedastic  regression  case,  it  is  not  just  outliers  in 
the  response  which  exert  a  large  influence,  but  also  outliers  in  the  design, 
or  high  leverage  points. 

In  Sections  3  and  4  we  consider  two  classes  of  estimators  for  the  variance 
parameter  which  alleviate  the  sensitivity  of  the  estimation  procedure  to  ex¬ 
treme  data  points.  In  Section  5  the  performance  of  these  estimators  is  eval¬ 
uated  on  a  number  of  data  sets,  and  their  computation  is  discussed  in  the 
appendix. 

2.  The  model 

In  what  follows  we  consider  the  heteroscedastic  linear  model 

y.  =  x. 'B  +  a.e. ;  i=l, . . . ,n 
i  i  ii 


(2.1) 


where  0  is  a  pxi  vector  of  regression  parameters  to  be  estimated. 


x.  is  a 
1 

PX1  vector,  and  the  are  independent  with  a  distribution  F  which  is 

assumed  to  be  symmetric  with  mean  0  and  variance  1.  The  quantities  are 
the  variances  of  the  y^  and  reflect  the  heterogeneity  of  variance  in  the 
model . 

A  common  situation  in  practice  is  that  cr  is  assumed  to  be  some 
known  function  of  the  explanatory  variables.  A  very  flexible  model  for  this 
situation  is 

a.  =  exp[h'(xi)0],  (2.2) 

where  h  is  a  known  1R  ^-valued  function  and  9  is  an  unknown  q*l  vector  of 
variance  parameters.  Note  that  (2.2)  includes  the  power  model  a.=  a|x.|^ 
of  the  previous  section,  by  taking 


h(x) 


In  many  cases,  may  reasonably  be  modelled  as  a  function  of  the  mean  re¬ 
sponse,  T.,  which  we  might  express  as 


ai  =  exp(h’(Ti)0]  (2.3) 

analogously  to  (2.2).  The  techniques  developed  below  for  the  model  in  (2.2) 
extend  readily  to  that  given  by  (2.3).  Details  may  be  found  in  Giltinan 
(1983).  For  brevity,  we  shall  discuss  only  the  first  model  in  this  article. 
We  shall  be  concerned  with  obtaining  bounded  influence  estimates  of  the  var¬ 
iance  parameter  9. 

To  do  this,  it  proves  helpful  to  investigate  why  the  maximum  likelihood 
estimator  of  0  described  above  is  so  sensitive  to  outliers.  Under  a  normal 
error  assumption,  if  6  were  known,  the  VLE  of  9  would  solve  the  equation 


page 


>vVB 


i=l 1  '•exp[h*  Cx±)  eMLEl 


-  l|  h(x^)  =  0. 


(2.4) 


The  influence  function  Tampel ;  1968,  1974)  for  the  maximum  likelihood  esti- 

? 

mator  is  proportional  to  (e“-l)h(x),  where  e  =  (y-x ' B) exp [-h ' (x) 9] ,  and  is 
thus  unbounded.  Since  the  influence  function  for  the  MLE  is  quadratic  in 
the  residual  e,  in  theory  a  point  with  a  sufficiently  large  residual  can 
have  an  arbitrarily  large  effect  on  the  maximum  likelihood  estimate  of  9. 
Carroll  and  Ruppert  (1982)  suggest  guarding  against  this  by  replacing  the 
term  in  braces  in  (2.4)  by  a  bounded  function  y(£) •  In  practice,  one  would 

/v 

also  replace  x^'8  by  x^'B  in  (2.4);  that  is,  they  suggest  solving 


n  y .  -  x .  '  3 
I  X  1  1  P  A 

i  =  l  jexp[h' (x^)  0] 


h(x.)  =  0 


(2.5) 


to  obtain  9.  They  suggest  a  choice  of  x  which  generalizes  the  classical 
Huber  Proposal  2  for  the  homoscedsatic  case.  Implementation  of  this  tech¬ 
nique  with  the  glucose  data  gave  estimates  of  A  as  0.72  and  0.87  based  on 
the  full  and  reuuced  data  sets  respectively.  Bounding  the  effect  of  a  large 
residual  certainly  narrows  the  gap,  but  not  by  as  much  as  one  might  like. 
Examination  of  the  influence  function  for  this  estimation  method  reveals 
why  this  is  the  case  -  it  is  proportional  to  x(£)h(x).  Choice  of  a  bounded 
X-function  limits  the  effect  of  large  e  on  the  estimate,  but  the  factor  of 
h(x)  may  still  lead  to  unbounded  influence.  A  moderate  residual  occurring 
in  conjunction  with  a  large  value  of  |h(x)[  may  still  have  quite  an  effect 

/v 

on  9.  Clearly,  if  one  wishes  to  protect  against  this,  the  estimator  for  9 
must  limit  not  only  the  effect  of  large  residuals,  but  also  of  large  values 


of  h(x) 


This  is  analogous  to  the  situation  in  the  usual  homoscedastic  regression 
case.  In  that  situation,  the  least  squares  estimate  of  6  is  the  solution  to 
n 

l  (yi-xi'B)x.  =  0  (2.6) 

i=l 

and  has  influence  function  proportional  to  r  x  ,  where  r  =  y-x'0.  The  influence 

function  will  be  large  in  absolute  value  if  either  |r|  is  large,  or  if  ||x|| 

is  large  -  i.e.  if  a  point  has  a  large  residual  or  high  leverage.  Various 

suggestions  have  been  made  in  the  recent  literature  on  how  to  modify  (2.6) 

to  obtain  efficient  bounded  influence  estimates  of  (3  (e.g.  Mallows,  1975; 

Maronna,  Bustos  and  Yohai,  1979;  Krasker,  1980;  Krasker  and  Welsch,  1982). 

Extension  of  these  methods  to  the  problem  of  estimating  9  is  discussed  in 

Giltinan  (1985).  In  this  article  we  focus  on  estimates  of  the  type  proposed 

by  Mallows  (1975)  and  by  Krasker  and  Welsch  (1982);  the  paper  by  Krasker 

and  Welsch  also  discusses  Mallows  estimators. 

One  approach  to  bounded  influence  estimation  proceeds  from  the  view 

that  one  is  interested  chiefly  in  limiting  the  sensitivity  of  the 

estimate  to  anomalous  data,  that  is,  in  bounding  y  :  =  sup  | IF (x,y) | ,  where 

(x,y) 

IF(x,y)  is  the  influence  function  at  a  point  (x,y) .  This  definition  is  not 
invariant  to  a  change  of  coordinate  system,  and  Krasker  and  Welsch  propose  two 
alternative  definitions  of  sensitivity  which  circumvent  this  lack  of  invariance 

Yt  :  =  sup  sup  IF(y}  I  =  sup  [IF(x,y)V_1IF(x,y)  ]\ 

(x,y)  A^O  ( A  *VA)  2  (x,y) 

and 

Y-:  =  sup  i x ' IF (x,y) | , 

(x,y) 

if  the  maximum  influence  a  point  can  have  on  its  own  fitted  value  is  of 
interest.  Hampel  (1978)  refers  to  y  as  the  "self-influence".  Stahel  (1981) 
calls  y?  the  "self-standardized  sensitivity",  because  the  estimator's  influ- 
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ence  function  is  normed  by  the  covariance  matrix  of  the  estimator  itself. 
y  ^  is  appropriate  if  one  is  chiefly  concerned  with  inferences  about  6; 

A 

A'IF(x,y)  is  the  influence  function  of  the  linear  function  A' 8  of  the  esti- 

i,  ~ 

mate  and  (A'VA)2  is  the  (asymptotic)  standard  deviation  of  A'8.  Therefore, 

A 

bounding  y^  insures  that  the  standardized  influence  of  A'8  is  below  a  cer¬ 
tain  bound  for  all  A. 

Y_  is  appropriate  when  one  is  more  concerned  with  estimating  the  response 
surface  at  the  observed  x's.  Using  Yj  rather  than  Y-,  leads  to  more  extreme 
downweighting  of  high  leverage  points;  this  can  be  understood  intuitively  by 
noticing  that  for  least-squares  estimation  where  IF(x,v)  is  proportional  to 
rx,  A'IF(x,y)  and  x' IF(x,y)  are,  respectively,  linear  and  quadratic  in  x. 
Thus,  the  amount  of  downweighting  needed  to  keep  Y2  or  bounded  is  of  or- 
der  1 1  x 1 1  ,  respectively,  ||x||  . 

In  practice,  we  had  most  success  when  working  with  y^.  We  judged  suc¬ 
cess  by  the  stability  of  the  overall  inference,  that  is,  how  little  infer¬ 
ences  were  changed  by  the  deletion  of  one,  or  a  few,  influential  points.  In 
what  follows,  therefore,  we  present  estimators  and  results  developed  with  a 

view  to  bounding  y,  =  sup  | x' IF(x,y ;3)  |  in  the  regression  case,  and  the  anal- 

(x,y) 

ogous  quantity  y  =  sup  |h* (x) IF(e,x;9) |  in  the  variance  estimation  case. 

J  (e,x) 

We  remark  that  the  methods  generalize  easily  to  cover  the  case  of  boundirg 
y^  or  y?,  see  e.g.  Giltinan  (1983). 

Our  success  when  estimating  8  with  methods  that  bound  y^  lead  us  to  con¬ 
jecture  that  bounding  y  should  be  tried  in  other  problems  as  well,  e.g.  homo- 
scedastic  regression.  This  may  possibly  lead  to  progress  on  the  "masking 
problem"  where  two  or  more  highly  influential  points  mask  each  other. 

5.  Joint  influence  estimates 

For  technicai  convenience,  throughout  this  and  the  next  section  it  will 
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be  useful  to  treat  the  design  points  { }  as  if  they  were  a  sample  from  a 
distribution  function  H  and  independent  of  the  disturbances (e^ } .  In  practice, 
our  estimators  are  equally  appropriate  for  fixed  and  random  {x^},  but  theor¬ 
etical  properties  are  more  easily  discussed  for  random  design  vectors. 

In  the  homoscedastic  case,  Krasker  and  Welsch  (1982)  proposed  estimating 
3  by  solving  an  equation  of  the  type 

j  "(y-.x^x  (y.-x  *6^)  =  0  (3.1) 

i=l 

where  w(y,x)  is  a  non-negative  bounded  continuous  weight  function  which  depends 
on  y  only  through  the  absolute  residual,  |r|  =  |y-x'g|/cr,  and  therefore  for 
symmetrically  distributed  e  gives  Fisher  consistency  at  the  normal  model  for 
each  x  in  IR^: 

E^  w(e,x;8)e  =  0.  (3.2) 

The  influence  function  for  the  estimator  solving  (3.1)  is  given  by 

IF  (y,x;g)  =  w(y,x)  M^xfy-x'B), 

W  w 

where 

M  =  E [ w(y , x)  (^~ ^)2  xx’]. 
w  o 

For  this  class  of  estimators  of  g, 

y,  =  sup  | x ' I F  (y,x;B)|  =  sup  w(y,x)  x'M  /x  |y-x'g|. 

*  (x,y)  w  (x,y) 

Clearly,  the  quantity  y^  maf  t>e  bounded  by  suitable  choice  of  w.  Krasker  and 
Welsch  address  the  question  of  choosing  w  to  bound  y^  in  an  efficient  manner. 
The  weight  function  which  they  suggest  has  the  intuitively  appealing  form: 
downweight  an  observation  only  if  its  influence  would  otherwise  exceed  the  max¬ 
imum  allowable  influence,  else  give  the  observation  a  full  weight  of  one.  That 
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is,  if  is  not  to  exceed  the  bound  a,,  then  the  appropriate  weight  function 
satisfies 


w (y,x;8) 


Min  1, 


a_ 


y-x’  8 
o 


jx'M 


-1 

w 


XJ 


(3.3) 


Such  a  weight  function  will  downweight  an  observation  which  has  a  large  resid¬ 


ual,  or  high  leverage.  As  discussed  in  the  last  section,  for  fixed 


y-K 


Si 


w(y,x;8)  is  of  order  llxl 


-2 


as  |  |  x  1  j  -*■  00  . 

We  may  extend  the  method  of  Krasker  and  Welsch  to  cover  the  estimation  of 
the  variance  parameter  0  as  follows.  Consider  the  class  of  estimators  of  9 
obtained  by  solving  an  equation  of  the  form 
n 


l  w(y. ,t. ;0) [(e. (0))2-l]h(x  )  =  0 
i=l 


(3.4) 


where  w  is  a  bounded,  continuous  non-negative  weight  function,  depending  on 
y  only  through  the  residual  e(9)  =  (y-x)exp[-h' (x)0] .  The  influence  function 
for  0  is  given  by 

IF  (y,x;0)  =  w(y,x;0)  M^hCx)  fe2-l)  (3.5) 

W  W 

where 

Mw  =  E{w(s,x)  (e2-l)2  h(x)h'(x)}. 

Thus,  for  this  class  of  estimators  of  0, 


Y 


3 


=  sup  |h ' (x) IF  (e ,x;9) |  *  sup  h ' (x)M~1h(x)w(e,x) | e2-l | . 
(e,x)  w  (e,x) 


By  analogy  to  the  homoscedastic  regression  case,  if  one  wishes  to  choose  w 
subject  to  a  bound  a_  on  y^,  then  a  reasonable  choice  of  w  is  as  follows: 


w(e,x;0) 


r  a" 

Min  1,  — - - - - 7- 

1-  |e  -1 ! h '  (x)M*1h(x)J 


(3.6) 
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The  similarity  with  (3,5)  is  clear.  It  may  be  shown  that  under  normality 


of  e  , 


r  r  a3  i 

■1  =  E  g  - Z- -  h(x)h'Cx) 

L  4i' (x)M_  Vx)-'  J 


(3.7) 


L-  Mi'(x)M  Ah(x)^  J 

w 

where  g(u) :  =  E^[Min| z“-l | ,u]  |z  -l|. 

The  methods  suggested  above  may  be  combined  into  a  single  estimation  pro¬ 
cedure,  the  details  for  which  are  given  in  the  appendix.  Since  the  goal  has 
been  to  bound  influence  simultaneously  over  the  design  and  the  residuals,  we 
call  this  a  joint  influence  estimate.  For  the  power  model  (1.1)  applied  to 
the  glucose  data,  the  joint  influence  estimate  of  X  changed  only  from  0.84  to 
0.87  with  the  deletion  of  observation  31.  If  observation  31  is  included,  then 
its  weight  (3.6)  equals  0.07,  strongly  supporting  Leurgans '  deletion  of  the 
point . 


4.  Separate  influence  estimates 


A  second  approach  which  we  have  found  useful  in  estimating  0  is  to  han¬ 
dle  high  leverage  points  and  outliers  separately.  We  do  this  by  adapting 
an  idea  of  Mallows,  see  Krasker  and  Welsch  (1982)  .  Specifically,  if  the 
mean  =  x'^B  were  known,  then  a  Mallows-type  estimator  for  0  would  solve 


l  h  w(h  )X  - — 4-  =  0, 

i=l  ^exp(h^9)' 


(4.1) 


where  h^  =  h(x^),  0<w(lu)  <  1  is  a  weight  function  and  x(’)  is  an  even  functic 

2 

with  mean  zero.  If  wfh.l  =  1  and  x(v)  =v  -1*  the  solution  to  (4.1)  is  the  ma: 

A 

imum  likelihood  estimate.  The  influence  function  for  0  solving  (4.1)  is 


IF(e,x,0)  =  Mw  h  w  (h)  x(0 /E(ex(e) ) , 


Mw  =  E [w (h) h  h '  ]  . 


where 
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The  asymptotic  covariance  of  0  is 


where 


V(w,X)  «  Mj1NwMj1EX2(£)/{E£X(e)}2, 


Nw  =[E  w  (h)h  h'  ], 


so  that  the  self-influence  is 


Y  =  sup J h'  IF (e , x, 0)  |  =  sup  h'M^h  sup  x(£)/E{£  X  (£)  } 


Note  that  both  the  covariance  matrix  and  the  self-influence  are  products  of 
functions  of  (h,w)  and  (e,x).  To  bound  the  self-influence  efficiently,  we 
follow  the  approach  of  Maronna,  Bustos  and  Yohai  (1979)  in  the  homoscedastic 
case  and  split  the  problem  into  two  parts,  minimizing 

M  M  1  and  Ex2(ej/{Eex(t) I" 

w  w  w 

subject  to  bounds  on 

sup  h (x) '  M  ^(x)  and  sup  xCO/EUxU)  b 
x  e 

respectively.  By  analogy  with  the  previous  section,  we  suggest  choosing 
w(x)  =  Min[l .a/x'M^x] , 


where 


Mw  =  E{Min [ 1 ,a/x' Mw  xj  x  x' } 


and  in  practice  we  estimate  M  by  solving 

-1  N  -1 
M  =  n  J  Min[l,a/x'.M  x.  ]  x.x! 
w  l  w  1  11 

The  estimated  design  weights  are 

w.  =  w(x;)  =  Min  [1  ,a/x!M_  *x;  ] . 


(4.2) 


The  effect  of  large  residuals  may  be  controlled  by  choice  of  the  function  x  - 

2 

one  possibility  is  to  take  x(*)  =  $  (*)~C>  where  p,  is  a  Huber  psi-function 

k  K 

and  C  is  a  constant  chosen  to  give  consistency  at  the  normal  model. 

The  techniques  describeu  in  this  section  may  be  combined  to  obtain  a 
bounded  influence  estimate  for  (3,0);  see  the  appendix  for  details.  A 
descriptive  term  for  the  process  is  separate  influence  estimates.  Since  the 
weights  (4.2)  depend  only  on  the  design,  they  serve  mostly  as  an  indicator 
of  leverage  in  the  values  (h(x^)}.  We  have  had  moderate  success  with  the 
following  indicator  of  large  residuals: 

t|;(ri)/ri,  ri  =  (yi-x^  3p)/exp(h'  (xi)6),  (4.3) 

where  ;>(•)  is  the  Huber  function 

<KX)  =  max(-c,min(x,c)) ,  (4.4) 

and  6p  is  a  preliminary  estimate  of  S  as  in  the  appendix.  When  applied  to 
the  glucose  data,  the  separate  influence  estimate  of  X  in  (1.1)  changed 
only  from  0.83  to  0.90  with  the  deletion  of  observation  31.  The  design 
weight  (4.2)  was  0.46,  with  residual  weight  (4.3)  given  by  .364. 

The  choice  of  tuning  constants  "a"  in  (4.2)  and  "c"  in  (4.4)  is  up  to 
the  user.  We  generally  vary  these  constants,  using  smaller  values  for  diag¬ 
nostic  purposes  and  larger  values  for  inference.  In  this  paper,  to  save 
space,  we  have  chosen  fixed  values  which  represent  a  compromise  between 
the  two  goals. 
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5 .  Examples 

(1)  Glucose  data:  Table  5.1  summarizes  the  parameter  estimates  for  the 
Leurgans  data  set,  obtained  by  each  of  the  four  methods  described  above.  The 
gain  in  stability  of  the  parameter  estimates  by  using  bounded  influence  meth¬ 
ods  is  evident  from  the  table.  The  price  exacted  is  that  robust  estimators 
are  asymptotically  less  efficient  at  the  normal  model.  While  in  this  ex¬ 
ample  the  robust  methods  do  seem  to  have  higher  estimated  standard  errors 
for  the  full  data,  with  the  outlier  included,  they  are  probably  better  than 
the  optimistic  GLS  estimates. 

Since  predicting  Y  from  X  is  a  primary  concern  for  these  data,  the 
widths  of  prediction  intervals  are  of  interest.  We  therefore  computed  such 
prediction  intervals  for  a  mean  at  various  values  in  the  range  of  X,  using 
each  of  the  four  methods,  and  investigated  the  change  upon  deleting  observa¬ 
tion  #31.  The  results  are  summarized  in  Table  5.2,  which  gives  ratios  of 
confidence  interval  lengths  for  different  values  of  X  for  the  respective 
estimation  methods. 

Again,  the  gain  in  stability  by  using  bounded  influence  methods  is  ob¬ 
vious  from  Table  5.2  The  stability  is  obtained  at  a  cost  -  the  prediction 
interval  lengths  for  the  joint  and  separate  influence  methods  show  an  increase 
typically  of  about  10-16%  over  those  obtained  by  maximum  likelihood  methods, 
when  operating  on  the  reduced  data.  This  is,  of  course,  in  agreement  with 
what  is  known  to  happen  exactly  at  the  normal  error  model. 

Both  the  joint  and  separate  influence  procedures  dov.nwcighted  observa¬ 
tion  #31  considerably  in  the  full  data  set.  In  this  example,  our  attention 
had  already  been  drawn  to  the  point;  however,  sometimes  the  weights  provided 
by  the  bounded  influence  methods  are  a  useful  diagnostic  tool  in  drawing  our 
attention  to  previously  unsuspected  influential  points.  This  is  illustrated 
by  our  next  example. 
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(2)  Gas  vapor  data:  Our  second  example  is  taken  from  Weisberg  (1980, 
page  146).  The  data  are  concerned  with  the  amount  of  gas  vapor  emitted  into 
the  atmosphere  when  gasoline  is  pumped  into  a  tank.  In  a  laboratory  experi¬ 
ment  to  investigate  this,  a  sequence  of  32  experimental  fills  was  carried 
out.  Four  variables  were  thought  to  be  relevant  for  predicting  Y,  the  quan¬ 
tity  of  emitted  hydrocarbons: 

x^  =  the  initial  tank  temperature,  in  °F 

x^  =  the  temperature  of  the  dispensed  gasoline,  in  °F 

x,  =  the  initial  vapor  pressure  in  the  tank,  in  psi 

x^  =  the  vapor  pressure  of  the  dispensed  gasoline,  in  psi 

This  data  set  has  been  used  by  Cook  and  Weisberg  (1983)  to  illustrate  their 
proposed  score  test  for  heteroscedasticity .  They  found  definite  evidence  of 
heteroscedasticity ,  with  the  variance  being  a  function  of  xx  and  x^,  and  ob¬ 
tained  this  empirical  estimate  of  the  direction  in  which  the  variance  is  in¬ 
creasing: 

Xj-  =  0.778  +  .110  x^  -  1.432  x^. 

A  plot  of  the  residuals  from  fitting  the  ordinary  least  squares  model  against 
x,-  shows  obvious  heteroscedasticity.  We  used  a  power  model  for  the  variance: 

°i  =  0 1 X5i+  °-5|X  ’  (5-1) 

adding  0.5  to  make  x^  +  0.5  >  0  in  all  observations,  and  analyzed  the  data 

using  our  three-stage  estimation  procedure.  We  employed  the  same  four  esti¬ 
mation  techniques  as  in  the  previous  example.  All  the  discussion  will  focus 

on  estimation  of  A,  although  the  estimation  of  8  after  appropriate  weighting 
is  interesting  and  nontrivial. 


Table  5.3  summarizes  the  estimates  of  A  and  associated  standard  errors, 
provided  by  the  different  estimation  methods.  For  these  data.  Cook  and  Weis- 
berg  note  that  for  unweighted  least  squares  there  are  no  unduly  influential 
cases.  Upon  performing  the  initial  analyses  on  the  full  data,  we  thus  were 

surprised  at  the  disparity  in  the  estimate  of  A  provided  by  the  different 
techniques.  It  turns  out  that  least  squares  estimation  of  g  and  generalized 
least  squares  estimation  of  A  operate  entirely  differently.  Closer  examina¬ 
tion  reveals  that  the  bounded  influence  methods  downweighted  observations 
#1  and  #2  considerably  in  the  estimation  of  A,  while  giving  close 
to  full  weight  to  all  other  data  (see  Table  5.3).  This  suggests  that  these 
two  points,  observation  #1  in  particular,  exert  considerable  influence  in 
determining  the  estimate  of  A.  This  is  borne  out  by  Table  5.3;  the  general¬ 
ized  least  squares  estimate,  which  makes  no  attempt  to  control  the  influence 
of  observations  #1  and  #2  in  the  full  data  set,  changes  considerably  on 
their  deletion.  The  bounded  influence  techniques,  on  the  other  hand,  control 
their  influence  in  the  full  data  set  and  are  relatively  insensitive  to  their 
deletion. 

Again,  we  find  that  the  bounded  influence  methods  provide  credible, 
stable  estimates  at  the  price  of  some  loss  of  efficiency.  This  example 
illustrates  their  use  also  as  a  diagnostic  method  for  locating  influential 
points.  We  stress  that  an  influential  observation  is  not  necessarily  'bad'  - 
it  may,  in  fact,  be  highly  informative.  We  do  feel,  however,  that  it  is 
of  value  to  be  able  to  identify  points  which  are  highly  influential.  The 
methods  proposed  in  this  article  provide  one  way  of  doing  this,  while  at 
the  same  time  providing  the  possibility  of  inference  based  on  the  well-known 


ideas  of  M-estimation. 
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Upon  reflection,  we  realized  that  much  of  the  difficulty  with  observa¬ 
tions  #1  and  #2  in  terms  of  estimating  A  is  due  to  the  model  (5.1).  For 
the  first  two  observations,  x^+0.5  is  very  nearly  zero  so  that  there  is 
a  huge  relative  disparity  among  the  values  of  {x^+0.5}  in  the  data.  Thus 
the  strong  influence  of  the  first  two  observations  on  estimating  A  is  some¬ 
what  artificial  and  can  be  weakened  by  using  x^^+1.0. 

This  example  is  a  good  illustration  of  the  power  of  our  techniques  to 
provide  stable  estimation  and  inference.  However,  from  the  view  of  model¬ 
ling  the  variances  it  is  somewhat  unsatisfactory  mainly  because  we  used  the 
linear  combination  x  as  given.  For  example,  one  might  consider  an  alter¬ 
native  variance  model 


a  +  ax  +  a.x  . 

0  1  li  2  2i 


(5.2) 


It  is  not  clear  how  best  to  adapt  our  techniques  to  the  model  (5.2),  and 
some  care  will  be  necessary.  This  data  set  is  actually  quite  a  difficult 
one  for  model  (5.2).  A  generalized  least  squares  estimate  gives  very  large 
weight  to  the  first  two  observations,  while  the  likelihood  appears  to  be 
unbounded. 


6 .  Conclusion 

We  have  introduced  new  methods  of  estimation  for  heteroscedastic  linear 
models  when  the  variances  are  a  power  of  the  mean  or  a  single  predictor. 
These  methods  are  simple  adaptations  of  the  ideas  of  bounded  influence  re¬ 
gression.  The  estimation  methods  should  serve  as  a  complement  to  influence 
techniques  such  as  the  graphs  of  Cook  and  Weisberg  (1983)  or  the  deletion 
diagnostic  ideas  of  Cook  (1985). 


Table  5.1  Glucose  data  estimates  of  X  for  the  model  (1.2),  with  estimated 
standard  errors  in  parentheses 


P 


Complete 

Data 

Reduced 

Data 

without  #31 

Generalized 

0.52 

0.86 

Least  Squares 

(0.18) 

(0.17) 

Carroll- 

0.72 

0.87 

Ruppert 

(0.19) 

(0.21) 

Joint  Influ- 

0.84 

0.87 

ence  Method 

(0.20) 

(0.19) 

#31  Weight  for  0 

.07 

#31  Weight  for  (3 

.04 

Separate  Influ- 

0.83 

0.90 

ence  Method 

(0.21) 

(0.20) 

#31  Weight  for  0 

.46 

#31  Weight  for  3 

.16 

Table  5.2  Ratio  of  length  of  mean  prediction  confidence  interval  for  the 
reduced  glucose  data  versus  the  complete  glucose  data. 


Value  of  x 

Generalized 
Least  Squares 

Carrol 1- 
Ruppert 

Joint  Influ¬ 
ence  Method 

Separate  Influ 
ence  Method 

60 

0.92 

1.05 

1.00 

0.98 

100 

0.87 

0.95 

0.96 

0.94 

180 

1.14 

1.05 

0.99 

0.99 

260 

1.20 

1.10 

1.00 

1.00 

380 

1.21 

1.12 

1.01 

1.01 

r 


Computation  of  the  bounded  influence  estimates 


For  completeness,  we  include  here  the  estimating  equations  for 
the  three-stage  bounded  influence  estimates  discussed  in  Sections  3  and  4. 

We  also  outline  the  algorithm  used  to  compute  the  three-stage  separate  influ 
ence  estimator  of  Section  4.  A  similar  algorithm  may  be  used  to  compute  the 

joint  influence  estimator  of  Section  3.  Details  are  given  in  Giltinan  (1983) 
The  three-stage  joint  influence  estimator  is  obtained  as  follows: 


Stage  1: 


Solve 


n  r 
l  Min  1, 
i=l  L 


y. -x. '6  -  i  J 

1  a* — 2-  x.'M  x. 

ill 


A 

-  (y. -x. '8  )x.  = 

1  i  p  i 


—  £  f  x— j —  x.x.  '  -  M  =0 

"  ih  L-mPx.J  11  1 


i  1  i 


n  rv.  -x.  '8  *r 

l  -  o 


simultaneously  for  M^,  and  8  ,  where  f(u)-  =  (Min | z | ,u) | z  and  x  is 

an  even  function  satisfying  Ex(e)  =  0. 

2  2 

We  used  v  of  the  form  x(*J  =  where  \p  is  a  Huber  psi- 

funct ion . 


Stage  2: 


Next  solve 


V  M  i  n  1 1 , 

i  =  1  L 


a7  Iff  Yz-t.  J2  1 

- - - —  {  - -Ihfx,) 

r  y.-t.  “1 2  ~  ,  -*  ^  1-exp[h'  (x • )  0 ] ^  J 

- -1 — - - x-  -1  h'  (x.)M*  h(x.) 

•exp[h '  (x; ) 6]'* 


i  I  gr — — -lhcx.jh'tx.)  -m,  = 

i=l  ih'(x^)M0  h(x^)-* 


for  and  6,  where  t.  =  x  '8  and  g  is  as  above. 

2  lip 


Stage  5: 

A  A 

Form  estimated  standard  deviations,  =  exp[h'(x^)0]  and  repeat 

/N  A, 

Stage  1,  using  the  transformed  variables  xt  =  x^/a  ,  y*  =  y^/a. . 

For  the  three-stage  separate  influence  procedure,  the  estimating 
equations  are  as  follows: 


Stage  1:  Solve 


n  r  a,  q  ■y. -x. 'a  ~ 

l  x  Min  1,  - : —  tj>  — — — 2-  = 

i ~  1  —  v  *M  x  ^  G  • 

xi  1  i  °1 

,  n  r  a,  ~i 

M  =  -  I  Min'  1 ,  - *-i- —  x.x.  ' 

1  n  •  ,  I  .,.-1  i  l 

i=l  b  x.  M,  x.J 


l  “1  l 


n  ry  -  x .  '  8 
V  „  ill 


a  -y 

l  x- 

i  =  l  L 


(A.  1) 


CA.2) 


(A. 3) 


simultaneously  for  8  »  and  a^,  where  $  is  a  nondecreasing  odd  function 
and  x  an  even  function  EyCe)  =0.  In  our  applications  we  used  a  Huber 
psi-function  in  (A.l)  and  y  as  in  Huber's  proposal  2  in  (A. 3) 


Stage  2:  Next  solve 


n  P  a2  1 

1  h(.x.)Min  1, - x— - 

i=l  1  [hf  (x,  )M_  h(x. )  ]- 


- 5n -  X  — — - H  =  0 

[h,(xi)M2  h(xi)]-1  iexp[h' (xi)0)-' 


y.-t. 

i  i 


(A.  4) 


'  ,  n  r  a 

l  Min  1, - ^ -  h(x.)h’(x. 

“  i=l  L  [h’Cxi)M2h(xi)]J 


(A.  5) 


for  0  and  NL,  where  t.  =  x. '6  . 

2  lip 


Stage  3: 


Form  estimated  standard  deviations,  =  expfh'Cx^B]  and  repeat 

A  A 

Stage  1,  using  the  transformed  variables  x*  =  x./a.  and  y*  =  y  /a  .  In 

l  ii  l  l  l 

solving  the  system  (A.1)-(A.3)  we  first  solved  (A. 2)  iteratively  to  obtain 

~  i  rn 

using  —  as  starting  value.  We  then  formed  estimated  weights, 

w^  =  Min[l,  aj/(x^'M  x^) ] ,  and  proceeded  to  solve  (A.l)  and  (A. 3)  by  means 
of  an  iteratively  reweighted  least  squares  algorithm  similar  to  that  described 
in  Huber  (1981,  pages  183-186).  The  equation  (A. 5)  may  be  solved  in  a  man¬ 
ner  similar  to  (A. 2),  and  estimated  weights  for  the  second  stage  computed. 

In  solving  (A. 4)  we  used  subroutine  ZXGSN  in  the  IMSL  library  and  assumed 
that  A  was  in  the  interval  (-2,2)  . 

The  truncation  values  a^  a2  and  a^  in  the  estimating  equations  still 
need  to  be  specified.  The  matrix  satisfies 


Mj  =  E  Mi 


ai 

in  1.  - -j— 

x'M.x- 


x  x '  , 


whence  it  follows  that 


r  ai  1  -1 

I  =  E  Min  1,  - —  xx'M. 

p*p  L  1 


x'Mj  xJ 


Taking  traces  across  this  equation: 


T  al  1  -1  -1 

P  =  E  Min  1,  - j — x’M  x  =  E  Min[x'M  x,a,  ]  , 

v  y-I  1  i 


x'Mx  x- 


so  that,  if  (A. 2)  is  to  have  a  solution,  a^  must  exceed  p.  Similarly,  q  is 
a  lower  bound  for  a2  We  have  had  good  success  in  setting  the  truncation 
values  at  one-and-a-half  times  the  lower  bound,  and  the  results  reported  in 


Section  5  above  used  this  cutoff  value. 


For  the  separate  inference  estimation  procedure,  formal  influence  cal¬ 
culations,  given  in  detail  in  Giltinan  (1983),  yield  the  following  expression 
for  the  influence  function  of  the  final  estimate  of  8: 


.-1 


IF(x,y;8)  =  M“A  x  Min  1, 


a,  i 


’Ky-x'g) 


x'M,  x->  E  i|>(e) 


The  asymptotic  covariance  matrix  of  8  is  given  by 


Var [/n  (8-8)]  =  M^N^1 

E  <Ke) 


where  N,  =  E  w“(x)  xx'  . 

A  A 

In  estimating  the  covariance  matrix,  we  estimated  by  M„,  by  = 
n  l  W3ixiXi '  *  and  the  scalar  quantity  Ei|>2 (e) /E2^(e)  by  i  £  ^2(ei)/[^  ^(ei)] 

A 

where  ei  =  yt  -  x?'8,  multiplying  in  practice  by  a  finite-sample  correction 
fprtor: 

1  +  rEll)  1  '  a 

'  n  ' 


-  compare  with  Carroll  and  Ruppert  (1982),  or  Huber  (1981,  page  173). 

Another  approach  to  computing  standard  errors  might  be  to  use  the  boot¬ 
strap.  While  this  is  a  reasonable  possibility,  at  this  point  it  is  not  clear 
how  best  to  implement  bootstrapping  in  the  presence  of  outliers  and  high 
leverage  points.  Future  work  in  this  area  is  clearly  needed. 

/A 

The  influence  function  for  8  is  given  by 


IF 


-1, 


(x,y;8)  =  h(x)  Min 


1, 


IX 


h'txJM^hfx)-! 


r  v-x'8 


exp[h 1 (x) 8 


i 


E(exOO) 


and  thus  0  has  asymptotic  covariance  matrix  given  by 


Var [i/n  (8-0)]  =  E[IF(x,y;8)  IF'(x,y;8)] 


=  N  -A  ^  —  ,  where  N?  =  Ew  (h(x) )h(x)h' (x) . 

EZ[eX(e) ] 

This  may  be  estimated  in  a  fashion  quite  analogous  to  that  used  to  estimate 
the  asymptotic  covariance  matrix  of  3-  Full  details  may  be  found  in  Giltinan 
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